Introduction

The error rate and the reject rate are commonly used
to describe the performance level of pattern recognition systems.
An error or misrecognition occurs when a pattern from one
class is identified as that of a different class. The error is
sometimes referred to as a substitution error or undetected
error. A reject occurs when the recognition system withholds
its recognition decision, and the pattern is rejected for
exceptional handling.

Because of uncertainties and noise inherent in any
pattern recognition task, errors are generally unavoidable.
The option to reject is introduced to safeguard against excessive
misrecognition; it converts potential misrecognition into
rejection. However, the tradeoff between the errors and
rejects is seldom one for one. Whenever the reject option
is exercised, some would-be correct recognitions are also
converted into rejects. We are interested in the best
error-reject tradeoff in the optimum rejection scheme.

An optimum rejection scheme was derived in Ref. 1.
The error-reject tradeoff curves have been used to describe
and compare the empirical performances of recognition methods
(e.g. Refs. 2 and 3), and they have also been found useful
in the actual system design of an optical page reader (Ref. 4).
However, few theoretical results on the error-reject tradeoff
are available.

This paper first describes an optimum rejection rule
and then derives a general relation between the error and
reject probabilities. The error rate can be directly evaluated
from the reject function. This result provides a basis for
calculating the error rates from the empirical rejection curve
without actually identifying the errors. Some simple properties
of the optimum tradeoff are presented. Examples in normal
distributions and uniform distributions are given.

Here $v$ is the pattern vector, $n$ is the number of classes,
$(p_1, p_2, \ldots, p_n)$ is the a priori probability distribution of the
classes, $F(v|i)$ is the conditional probability density for $v$
given the $i$th class, $d_i$ ($i \neq 0$) is the decision that $v$ is identified
as of the $i$th class while $d_0$ is the decision to reject, and $t$ is
a constant between 0 and 1 ($0 \le t \le 1$). The probability of error,
or error rate, is



$$E(t) = \int_V \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} \delta(d_i|v)\, p_j F(v|j)\, dv \tag{5}$$

and the probability of reject, or reject rate, is

$$R(t) = \int_V \delta(d_0|v) \sum_{i=1}^{n} p_i F(v|i)\, dv \tag{6}$$

where $V$ is the pattern space. Both the error and reject
rates are implicit functions of the parameter $t$.
The probability of correct recognition is

$$C(t) = \int_V \sum_{i=1}^{n} \delta(d_i|v)\, p_i F(v|i)\, dv = 1 - E(t) - R(t) \tag{7}$$

and the probability of acceptance (or acceptance rate) is
defined as

$$A(t) = C(t) + E(t). \tag{8}$$



Rejection Threshold

The parameter $t$ in the decision rule will be called
"the rejection threshold". For any fixed value of $t$ ($0 \le t \le 1$)
the decision rule $\delta$ partitions the pattern space $V$ into two
disjoint sets (or regions), $V_A(t)$ and $V_R(t)$, where equations
(2) and (3) respectively hold, namely:

$$V_A(t) = \{ v \mid \max_i [p_i F(v|i)] \ge (1-t) F(v) \} \tag{9}$$

$$V_R(t) = \{ v \mid \max_i [p_i F(v|i)] < (1-t) F(v) \} \tag{10}$$

where

$$F(v) = \sum_{i=1}^{n} p_i F(v|i). \tag{11}$$

Without loss of generality, it will be assumed that
$F(v)$ is nonzero over the entire space $V$; otherwise the set
over which $F(v)$ is zero is first deleted. $V_A$ and $V_R$ are
called respectively the "acceptance region" and the "reject
region" of the decision rule. An example is depicted in
Fig. 1(a), where the shaded and unshaded regions mark the
two sets.
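As an illustration of how the partition in (9) and (10) acts on a single pattern, the following minimal sketch takes the joint values $p_i F(v|i)$ for one pattern and returns either a class index or a reject. The function name `decide`, the constant `REJECT`, and the tie-breaking choice are assumptions made for this sketch; they are not taken from the paper.

```python
from typing import Sequence

REJECT = 0  # index used here for the reject decision d_0 (naming is an assumption)


def decide(joint: Sequence[float], t: float) -> int:
    """Apply the partition (9)-(10) to one pattern v.

    joint[i-1] holds p_i * F(v|i) for class i = 1..n.
    Returns the accepted class index (1..n) or REJECT.
    """
    total = sum(joint)   # F(v), eq. (11)
    best = max(joint)    # max_i p_i F(v|i)
    if best >= (1.0 - t) * total:      # v lies in V_A(t), eq. (9)
        return joint.index(best) + 1   # assumption: ties go to the lowest index
    return REJECT                      # v lies in V_R(t), eq. (10)
```

For instance, `decide([0.30, 0.10], 0.3)` accepts class 1 because 0.30 is at least 0.7 times 0.40, while `decide([0.30, 0.20], 0.3)` rejects.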

We shall now present some simple properties of the
rejection threshold t:

(a) both the error and reject rates are monotonic in t,
(b) t is an upper bound of the error rate, and
(c) t is a differential error-reject tradeoff ratio.

(a) Monotonicity

It follows immediately from the definitions of (9)
and (10) that for any $t_1$ and $t_2$ in $[0, 1]$, if $t_1 < t_2$ then

$$V_A(t_1) \subset V_A(t_2) \quad \text{and} \quad V_R(t_1) \supset V_R(t_2).$$



With the aid of equations (1) and (4), (9) and (10),
the various probabilities can be written as

$$R(t) = \int_{V_R(t)} F(v)\, dv \tag{6'}$$

$$A(t) = \int_{V_A(t)} F(v)\, dv \tag{8'}$$

$$C(t) = \int_{V_A(t)} \max_i [p_i F(v|i)]\, dv \tag{7'}$$

and

$$E(t) = \int_{V_A(t)} \{ F(v) - \max_i [p_i F(v|i)] \}\, dv. \tag{5'}$$

All the integrands in the above integrals are non-negative,
hence if the domain of integration expands,
the value of the integral increases. More specifically,
if $t_1 < t_2$ then $V_A(t_1) \subset V_A(t_2)$ and $V_R(t_1) \supset V_R(t_2)$;
therefore, $E(t_1) \le E(t_2)$ and $R(t_1) \ge R(t_2)$.
In other words, E increases and R decreases with
increasing t. In particular, when t = 0, E = 0 and when
t = 1, A = 1 and R = 0. Whenever $t \ge 1 - 1/n$, R = 0.

(b) An Upper Bound of Error Rate

We shall now show that

$$E(t) \le t. \tag{12}$$

For any $v$ in $V_A(t)$, we have

$$\max_i [p_i F(v|i)] \ge (1-t) F(v).$$

Therefore,

$$\int_{V_A} \max_i [p_i F(v|i)]\, dv \ge (1-t) \int_{V_A} F(v)\, dv$$

which, with (7') and (8'), is

$$C(t) \ge (1-t) A(t).$$

Hence

$$E(t) \le t A(t) \le t.$$
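Properties (a) and (b) can be checked numerically on a small finite pattern space. The sketch below is only an illustration: the four-pattern table `TABLE` is hypothetical, the names `rates`, `GRID`, and `CURVE` are introduced here, and exact rational arithmetic is used so that patterns lying exactly on the boundary of (9) are handled without rounding effects.

```python
from fractions import Fraction as Fr


def rates(joint_table, t):
    """Return (E, R, C, A) at threshold t for a finite pattern space.

    joint_table[k][i] is p_i * F(v_k|i) for pattern v_k and class i.
    """
    E = R = C = Fr(0)
    for row in joint_table:
        total, best = sum(row), max(row)
        if best >= (1 - t) * total:   # v_k in V_A(t), eq. (9)
            C += best                 # accepted and correctly recognized
            E += total - best         # accepted but misrecognized
        else:                         # v_k in V_R(t), eq. (10)
            R += total
    return E, R, C, C + E


# Hypothetical two-class table; the entries sum to 1.
TABLE = [
    [Fr(30, 100), Fr(2, 100)],
    [Fr(15, 100), Fr(10, 100)],
    [Fr(8, 100), Fr(12, 100)],
    [Fr(3, 100), Fr(20, 100)],
]
GRID = [Fr(k, 20) for k in range(21)]          # t = 0, 0.05, ..., 1
CURVE = [rates(TABLE, t) for t in GRID]
```

On this table every pattern is rejected at t = 0 and every pattern is accepted once t reaches 0.4, since the smallest maximum posterior is 0.6. Along `GRID` the error rate never decreases, the reject rate never increases, and E(t) stays below tA(t).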
(c) is shown in the following section.

Error-Reject Tradeoff

A complete description of the performance of recognition
systems is given by the error-reject tradeoff, i.e., the functional
relationship between E and R at all levels. A typical tradeoff curve is given
in Fig. 2. Since both E and R of the optimum recognition
systems are monotonic functions of the rejection threshold
t, one can compute the tradeoff E vs. R from E(t) and R(t).

We shall now show that the rejection function R(t) alone
suffices to completely characterize the optimum recognition
performance. In other words, E can be derived from R(t),
or from its inverse t(R). The central result is the simple
functional relation between E and R, namely

$$E = \int_{R}^{1} t(R)\, dR. \tag{13}$$

This relation is valid for all optimum decision rules
as defined in Equations (1) - (4). No explicit forms for the
density functions $F(v|i)$ are required in deriving the integral
relation of (13). However, it will be assumed for convenience
in the following derivation that $R(t)$ is differentiable with
respect to $t$. Under this assumption, the inverse function $t(R)$
is single-valued. However, this assumption will later be
removed.

Consider an incremental change in the rejection threshold
from $t$ to $t - \Delta t$; the reject region expands from $V_R(t)$ to
$V_R(t - \Delta t)$. Let $\Delta V_R(t)$ denote the incremental region
$V_R(t - \Delta t) - V_R(t)$. Any $v$ in $\Delta V_R(t)$ was accepted at the threshold
$t$ and is now rejected at the lower threshold $t - \Delta t$. Equations
(2) and (4) now give

$$(1-t) F(v) \le \max_i p_i F(v|i) < (1 - t + \Delta t) F(v) \quad \text{for } v \in \Delta V_R(t). \tag{14}$$

By integrating the last expression over the incremental
region $\Delta V_R$, one obtains

$$(1-t)\, \Delta R \le -\Delta C < (1 - t + \Delta t)\, \Delta R \tag{15}$$

where $\Delta R$ and $\Delta C$ are respectively the increments in the
rejection rate and correct recognition rate, namely

$$\Delta R = \int_{\Delta V_R} F(v)\, dv$$

$$\Delta C = -\int_{\Delta V_R} \max_i [p_i F(v|i)]\, dv.$$

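Of course, the increment in the error rate is simply

$$\Delta E = -\Delta R - \Delta C. \tag{16}$$

By substituting (16) into (15), one has

$$-t\, \Delta R \le \Delta E < -(t - \Delta t)\, \Delta R \tag{17a}$$

$$-t\, \frac{\Delta R}{\Delta t} \le \frac{\Delta E}{\Delta t} < -(t - \Delta t)\, \frac{\Delta R}{\Delta t}. \tag{17b}$$

Since R(t) is differentiable, $\Delta R / \Delta t$ has a limit as $\Delta t \to 0$, and (17) yields

$$\frac{dE}{dt} = -t\, \frac{dR}{dt}. \tag{18}$$

By integrating (18) from t = 0 to t, one has

$$E(t) - E(0) = -\int_0^t t\, \frac{dR}{dt}\, dt = -\int_{R(0)}^{R(t)} t(R)\, dR.$$

Since E(0) = 0 and R(0) = 1, the above expression becomes

$$E = \int_{R}^{1} t\, dR. \tag{13}$$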

This relation is depicted in Fig. 3. Equation (13) can also
be written, through an integration by parts and as indicated
in Fig. 3, as

$$E = \int_0^t R(t)\, dt - t R(t). \tag{19}$$

Equation (13) gives

$$\frac{dE}{dR} = -t. \tag{20}$$

The rejection threshold is the differential error-reject
tradeoff. In particular, the initial slope of the error-reject curve
is $-1 + 1/n$ or greater while the final slope is 0. Equation (20) also gives:

$$\frac{d^2 E}{dR^2} = -\frac{dt}{dR}. \tag{21}$$

The optimum error-reject curve is always concave
upward and the slope increases from its initial value (at least $-1 + 1/n$)
to 0 as R increases from 0 to 1 (Figure 2).

Although the integral relation (13) is derived under the
assumption that R(t) is differentiable, the assumption of
differentiability is not essential. For example, it suffices
to assume that R(t) is continuous. A proof for (13) would start
with (17a) instead of (17b). Actually no assumption about R(t)
is necessary for the validity of (13). R(t) is, of course, a monotonic
and bounded function of t and is thus of bounded variation.
Consequently, the Stieltjes integral $\int t\, dR(t)$ always exists.

The error-reject integral is in general

$$E(t) = -\int_0^t t\, dR(t). \tag{22}$$

To show (22) we first sum (17a) with t steadily increasing
throughout the range of interest to obtain

$$-\sum t\, \Delta R \le E(t) \le -\sum t\, \Delta R + \sum \Delta t\, \Delta R$$

and then let $\Delta t$ tend to zero. As $\Delta t$ tends to zero the last
sum of the above expression vanishes and (22) results.
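The Stieltjes form suggests a direct way to compute the error rate from a tabulated reject curve. The sketch below approximates the integral by a finite sum, evaluating t at the midpoint of each step; the function name `error_from_reject_curve`, the midpoint choice, and the quadratic reject curve used to exercise it are all assumptions made for illustration, not data from the paper.

```python
def error_from_reject_curve(ts, rs):
    """Approximate E(t_k) = -integral_0^{t_k} t dR(t) by a finite sum.

    ts: increasing thresholds with ts[0] == 0; rs: reject rates R(ts[k]).
    Returns a list of the same length holding the running error estimate.
    """
    errors = [0.0]  # E(0) = 0
    for k in range(1, len(ts)):
        t_mid = 0.5 * (ts[k - 1] + ts[k])   # evaluation point inside the step
        d_r = rs[k] - rs[k - 1]             # increment of R (non-positive)
        errors.append(errors[-1] - t_mid * d_r)
    return errors


# Hypothetical reject curve: R(t) = (1 - 2t)^2 on [0, 1/2], zero afterwards.
N = 1000
TS = [k / N for k in range(N + 1)]
RS = [(1 - 2 * t) ** 2 if t <= 0.5 else 0.0 for t in TS]
ES = error_from_reject_curve(TS, RS)
```

For this hypothetical curve the integral has the closed form $2t^2 - \tfrac{8}{3}t^3$ on $[0, 1/2]$, so the final entry of `ES` should be close to 1/6. Because only increments of R enter the sum, the same routine also accepts a reject curve with jumps.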



Another Proof of the Error-Reject Integral

We shall now present another proof* of the error-reject
integral of (13) with a hope to provide additional insight into the
relation. Let M(v) denote the random variable:

$$M(v) = \frac{\max_i p_i F(v|i)}{F(v)}. \tag{23}$$

M(v) is the maximum of the a posteriori probabilities of the
classes given the pattern v. Let g(m) be the density function
of M. In general g(m) is a rather complex function of the underlying
density functions $F(v|i)$; however, its explicit form does
not concern our proof here. In terms of the variable M(v),
the reject condition, Eq. (4), becomes

$$M(v) \le 1 - t \tag{24}$$

and the probability of reject is

$$R(t) = \int_0^{1-t} g(m)\, dm \tag{25}$$

which also gives

$$g(m) = \frac{dR(1-m)}{dm}. \tag{26}$$



*This proof is due to Mr. M. Hellman and Dr. J. Raviv of
IBM Watson Research Center.

Since g(m) is non-negative, R(t) is a monotonic increasing
function of the upper limit of the integral of (25), or a monotonic
decreasing function of t. By the definition of M(v),
the probability of correct recognition for a given reject threshold
t is

$$C(t) = \int_{1-t}^{1} m\, g(m)\, dm. \tag{27}$$

By substituting (26) in (27) and integrating by parts, we obtain:

$$C(t) = -\int_0^t (1-t)\, dR(t) = 1 - (1-t) R(t) - \int_0^t R(t)\, dt. \tag{28}$$

The error rate is then

$$E(t) = 1 - R(t) - C(t) = \int_0^t R(t)\, dt - t R(t)$$

which is (19) and is equivalent to (13).
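The quantities in (25) and (27) have simple sample analogues when values of M(v) are available for a set of unlabeled patterns. The sketch below is an illustration under two stated assumptions: the samples are drawn from the mixture F(v), and the posteriors used to form M(v) are the true ones. The function name `rates_from_max_posteriors` is introduced here, and patterns with M(v) exactly equal to 1 - t are counted as accepted, following (9).

```python
def rates_from_max_posteriors(ms, t):
    """Estimate (E, R, C) at threshold t from samples of M(v).

    ms: values of M(v) for patterns drawn from the mixture F(v);
    no class labels are needed.
    """
    n = len(ms)
    accepted = [m for m in ms if m >= 1.0 - t]  # patterns outside the reject set
    R = 1.0 - len(accepted) / n                 # sample version of (25)
    C = sum(accepted) / n                       # sample version of (27)
    E = len(accepted) / n - C                   # E = 1 - R - C
    return E, R, C
```

With the five samples `[0.9, 0.6, 0.6, 0.8, 1.0]` and t = 0.3, three patterns are accepted, giving R = 0.4, C = 0.54 and E = 0.06 up to rounding.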



Rejection Threshold of a Minimum Risk Rule

It is known that the optimum decision rule given in
Equations (1) to (4) is also a minimum risk rule if the cost
function is uniform within each class of decisions, i.e. if no
distinction is made among the errors, among the rejects,
and among the correct recognitions. The rejection threshold
is then related to the costs as follows:

$$t = \frac{W_r - W_c}{W_e - W_c} \tag{29}$$

where $W_e$, $W_r$, and $W_c$ are the costs for making an error,
reject and correct recognition respectively. Usually $W_e > W_r > W_c$.
The rejection threshold is simply the normalized
cost for the rejection. We can take $W_c = 0$ and $W_e = 1$, and
the minimum risk is

$$\text{Risk}(t) = E(t) + t R(t) = \int_0^t R(t)\, dt \tag{30}$$

which is also depicted in Fig. 3.
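The cost relation lends itself to a short numerical check. The sketch below computes the threshold of (29) from three costs and compares the average cost per pattern with the normalized risk of (30); the function names and the sample cost values in the usage note are hypothetical. Substituting C = 1 - E - R into the average cost shows that it equals $W_c + (W_e - W_c)(E + tR)$, which is the identity the usage note relies on.

```python
def rejection_threshold(w_error, w_reject, w_correct):
    """Eq. (29): t = (W_r - W_c) / (W_e - W_c)."""
    return (w_reject - w_correct) / (w_error - w_correct)


def expected_cost(E, R, w_error, w_reject, w_correct):
    """Average cost per pattern, using C = 1 - E - R."""
    C = 1.0 - E - R
    return w_error * E + w_reject * R + w_correct * C


def normalized_risk(E, R, t):
    """Eq. (30): the risk when W_c = 0 and W_e = 1."""
    return E + t * R
```

For example, with hypothetical costs of 10, 3 and 1 for an error, a reject and a correct recognition, `rejection_threshold(10.0, 3.0, 1.0)` gives 2/9, and `expected_cost(0.05, 0.2, 10.0, 3.0, 1.0)` agrees with `1.0 + 9.0 * normalized_risk(0.05, 0.2, 2 / 9)`.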



Examples

For numerical illustration, two examples are given
here. In these examples, the pattern vector v is one-dimensional
and there are two pattern classes with equal a priori
probability of occurrence, i.e. $p_1 = p_2 = 1/2$. The examples
respectively are concerned with the normal distributions and
uniform distributions.

For two classes, the condition for rejection, namely
Eq. (4), can never be satisfied when $t > 1/2$; hence the reject
rate is always zero if the reject threshold, t, exceeds 1/2. The
effective range of t for two classes of problems is, therefore,
from 0 to 1/2. With n = 2 and $0 \le t \le 1/2$, it can be readily
verified that the condition for rejection, (4), is equivalent to

$$\frac{t}{1-t} \le \frac{p_1 F(v|1)}{p_2 F(v|2)} \le \frac{1-t}{t}. \tag{31}$$
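The verification can be written out in a few lines. The following is a sketch in which the ratio symbol $r$ is introduced here for convenience, the reject condition is taken in the form (24), and $0 < t \le 1/2$ with $p_2 F(v|2) > 0$ is assumed.

```latex
\begin{align*}
r &= \frac{p_1 F(v|1)}{p_2 F(v|2)}, \qquad
M(v) = \frac{\max(r, 1)}{1 + r}, \\
r \ge 1:&\quad \frac{r}{1+r} \le 1 - t
  \iff t\, r \le 1 - t
  \iff r \le \frac{1-t}{t}, \\
r < 1:&\quad \frac{1}{1+r} \le 1 - t
  \iff t \le (1-t)\, r
  \iff r \ge \frac{t}{1-t}.
\end{align*}
```

Since $t \le 1/2$ implies $t/(1-t) \le 1 \le (1-t)/t$, the two cases combine into the single two-sided condition on $r$.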



(1) Normal Distributions of Equal Variance

Consider two normal distributions with means $\mu_1$ and
$\mu_2$ and equal variance $\sigma^2$ (Fig. 4). Take $\mu_1 > \mu_2$. The
density function is, for i = 1 or 2,

$$F(v|i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(v - \mu_i)^2}{2\sigma^2} \right]. \tag{32}$$

With (32) and some algebraic manipulations, (31) can
be transformed to

$$\left| v - \frac{\mu_1 + \mu_2}{2} \right| \le \frac{\sigma^2}{\mu_1 - \mu_2} \ln\left( \frac{1}{t} - 1 \right), \tag{33}$$

i.e., the optimum rule is to reject whenever the pattern lies
within a certain distance of the midpoint between the two
means. The corresponding error and reject rates are

$$E(t) = \Phi(a) \tag{34a}$$

$$R(t) = \Phi(b) - \Phi(a) \tag{34b}$$

where $\Phi$ is the normal cumulative distribution function, namely

$$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-x^2/2}\, dx \tag{35}$$

and the parameters are

$$a = -\frac{s}{2} - \frac{1}{s} \ln\left( \frac{1}{t} - 1 \right) \tag{36a}$$

$$b = -\frac{s}{2} + \frac{1}{s} \ln\left( \frac{1}{t} - 1 \right) \tag{36b}$$

$$s = (\mu_1 - \mu_2)/\sigma. \tag{36c}$$

The parameter s is the (normalized) separation between
the means of the distributions and is the only (composite)
parameter of the distributions that R(t) and E(t) depend upon.
It is straightforward to verify (18) and hence (13) for this
example. A set of the error, reject, and tradeoff curves
(for s = 1, 2, 3, and 4) is depicted in Fig. 5.
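The verification mentioned above can also be done numerically. The sketch below evaluates the error and reject rates of the equal-variance normal example as $E = \Phi(a)$ and $R = \Phi(b) - \Phi(a)$, with $a$, $b = -s/2 \mp (1/s)\ln(1/t - 1)$, and compares $E(t)$ with the integral form $\int_0^t R\,dt - tR(t)$ of (19) using a trapezoidal rule. The function names, the step count, and the clamping outside $0 < t < 1/2$ (where R is 1 at t = 0 and 0 beyond t = 1/2) are choices made for this sketch.

```python
import math


def phi_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_rates(t, s):
    """Return (E, R) for the two-class normal example with separation s."""
    if t <= 0.0:
        return 0.0, 1.0                  # everything is rejected at t = 0
    if t >= 0.5:
        return phi_cdf(-s / 2.0), 0.0    # no rejects beyond t = 1/2
    log_term = math.log(1.0 / t - 1.0)
    a = -s / 2.0 - log_term / s
    b = -s / 2.0 + log_term / s
    return phi_cdf(a), phi_cdf(b) - phi_cdf(a)


def error_via_integral(t, s, steps=2000):
    """E(t) from the reject curve alone: integral of R minus t*R(t)."""
    h = t / steps
    rs = [normal_rates(k * h, s)[1] for k in range(steps + 1)]
    area = h * (sum(rs) - 0.5 * (rs[0] + rs[-1]))   # trapezoidal rule
    return area - t * rs[-1]
```

For s = 2 and t = 0.3, `normal_rates(0.3, 2.0)[0]` and `error_via_integral(0.3, 2.0)` should agree to within the quadrature error, which is the content of the integral relation for this example.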
(2) Uniform Distributions

Consider two uniform distributions:

$$F(v|1) = \begin{cases} 1 & \text{when } 0 \le v \le 1 \\ 0 & \text{elsewhere} \end{cases} \tag{37a}$$

$$F(v|2) = \begin{cases} \tfrac{1}{2} & \text{when } \tfrac{1}{2} \le v \le \tfrac{5}{2} \\ 0 & \text{elsewhere} \end{cases} \tag{37b}$$

which are shown in Fig. 6.

The reject function R(t) is simply:

$$R(t) = \begin{cases} \tfrac{3}{8} & \text{when } 0 \le t \le \tfrac{1}{3} \\ 0 & \text{when } \tfrac{1}{3} < t \le 1 \end{cases} \tag{38}$$

which is discontinuous (Fig. 7a), and the integral of (22) is
evaluated to

$$E(t) = \begin{cases} 0 & \text{when } 0 \le t \le \tfrac{1}{3} \\ \tfrac{1}{8} & \text{when } \tfrac{1}{3} < t \le 1 \end{cases}$$

which is shown in Fig. 7(b), and the tradeoff can assume only
two values, namely (E, R) = (1/8, 0) or (0, 3/8) (Fig. 7c). However,
if a randomized scheme is used in the range $1/2 \le v \le 1$,
R may vary continuously from 3/8 to 0 as shown by the dotted
line in Fig. 7(c).
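The two tradeoff points of this example can be reproduced with exact arithmetic. The sketch below assumes that class 1 is uniform with density 1 on [0, 1] and class 2 is uniform with density 1/2 on [1/2, 5/2], which is the reading of (37) consistent with the values 1/8 and 1/3 appearing in the text; the names `PIECES` and `uniform_rates` are introduced here. The pattern axis is split into three pieces on which both densities are constant.

```python
from fractions import Fraction as Fr

P1 = P2 = Fr(1, 2)  # equal a priori probabilities

# (length of piece, F(v|1), F(v|2)) for [0, 1/2), [1/2, 1], (1, 5/2]
PIECES = [
    (Fr(1, 2), Fr(1), Fr(0)),
    (Fr(1, 2), Fr(1), Fr(1, 2)),
    (Fr(3, 2), Fr(0), Fr(1, 2)),
]


def uniform_rates(t):
    """Return (E, R) at threshold t for the assumed uniform example."""
    E = R = Fr(0)
    for length, f1, f2 in PIECES:
        j1, j2 = P1 * f1, P2 * f2          # p_i F(v|i) on this piece
        total, best = j1 + j2, max(j1, j2)
        if best >= (1 - t) * total:        # piece is accepted
            E += length * (total - best)
        else:                              # piece is rejected
            R += length * total
    return E, R
```

Under these assumptions `uniform_rates(Fr(1, 4))` returns (0, 3/8) and `uniform_rates(Fr(1, 2))` returns (1/8, 0). Only the overlap piece, where the larger posterior is 2/3, is ever rejected.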



Some Practical Implications

Most of our results on the error-reject tradeoff seem
consistent with our intuition, although the simple integral
relation between the error and reject rates is somewhat
unexpected. These results have some practical implications and
are useful in system design and performance evaluation.

Since the slope of the error-reject tradeoff curve (Fig. 2)
is the value of the rejection threshold, the tradeoff is most
effective initially (i.e. at the low level of rejection) and it gets
harder as the error rate gets lower. This is certainly common in
our practical experience; excessive rejection is generally
required to reduce residual errors.

Practical applications of the present results are in the areas
of system design and performance evaluation of the recognition
systems. The general characteristics of the error-reject tradeoff
curve provide the system designer a convenient means of
verifying the basic assumption on the underlying probability
distributions. The integral of (13) makes it possible to calculate
error rates, and consequently the tradeoff curve, from the
empirically observed reject rates. No class identification of the sample
patterns is required in obtaining the empirical rejection
curve. Or equivalently one can just obtain an empirical density
function of the maximum of the a posteriori probabilities, and
then calculate the error and reject rates using equations (25)
and (27).

In most recognition tasks, the underlying probability
distributions of the patterns are not completely known and the
design of the recognition systems is generally based on empirical
data. A common design procedure is to assume, on the basis
of available (usually limited) a priori information and the designer's
intuition, some functional forms of the distributions, to derive
the system structure based on these assumptions, and to adjust
the system parameters by using the empirical data. It is not
always a simple matter to verify the validity of the assumptions
on which the system structure is based. However, one can
always, though laboriously, obtain the empirical error-reject
tradeoff curve and compute the theoretical one from the basic
assumptions. A comparison of the empirical and theoretical
tradeoff curves can quickly reveal how well the theoretical model
agrees with the empirical data and serve as a checkpoint for
initiating the process of revising and improving the theoretical
model.

The data used in any meaningful evaluation of a recognition
system is usually large and it is extremely costly and
time consuming to detect the recognition errors. To identify
a recognition error, additional information, usually human
inspection at some stage, is required. On the other hand,
the rejection is the explicit result of a definite decision,
and the rejects can be readily recorded and tallied. Equation
(13) provides a simple means of calculating the error rate
from the reject curve without actually identifying the errors.

Conclusion

A general error and reject tradeoff relation is derived
for the (Bayes) optimum recognition system with an option to
reject. The error probability is a Stieltjes integral of the
rejection threshold with respect to the reject probability.
The error function can be directly evaluated from the reject
function. Hence, the reject function determines the recognition
error and reject tradeoff and completely characterizes
the performance of the optimum recognition system.

Some practical implications in the system design and
performance evaluation of the recognition systems are discussed.
The error-reject integral provides a simple means
of calculating the error rate from the empirical reject curve
without actually identifying the recognition errors.
FIGURE CAPTIONS

1. Reject Regions in the Pattern Space

2. Error-Reject Tradeoff Curve

3. Reject Curve

4. Example in Normal Distribution

5. Normal Distributions: (a) Reject and Error Curves
(b) Tradeoff Curve

6. Example in Uniform Distributions

7. Uniform Distributions: (a) Reject Curve (b) Error
Curve (c) Tradeoff Curve